Abstract

People's personal social networks are big and cluttered, and currently there is no
good way to automatically organize them. Social networking sites allow users to
manually categorize their friends into social circles (e.g. 'circles' on Google+, and 'lists' on
Facebook and Twitter), however they are laborious to construct and must be updated
whenever a user's network grows. In this paper, we study the novel task of
automatically identifying users' social circles. We pose this task as a multi-membership
node clustering problem on a user's ego-network, a network of connections between her
friends. We develop a model for detecting circles that combines network structure as
well as user profile information. For each circle we learn its members and the
circle-specific user profile similarity metric. Modeling node membership to multiple circles
allows us to detect overlapping as well as hierarchically nested circles. Experiments
show that our model accurately identifies circles on a diverse set of data from Facebook,
Google+, and Twitter, for all of which we obtain hand-labeled ground-truth.

1 Introduction

Online social networks allow us to follow streams of posts generated by hundreds of our
friends and acquaintances. The people we follow generate overwhelming volumes of
information, and to cope with the 'information overload' we need to organize our personal
social networks (Agarwal et al., 2008; Chen and Karger, 2006; El-Arini et al., 2009). One of
the main mechanisms for users of social networking sites to organize their networks and the
content generated by them is to categorize their friends into what we refer to as social
circles. Practically all major social networks provide such functionality, for example,
'circles' on Google+, and 'lists' on Facebook and Twitter. Once a user creates her circles,
they can be used for content filtering, for privacy, and for sharing groups of users that
others may wish to follow.

Examples of circles from a user's personal social network are shown in Figure 1. The
'owner' of such a network (the 'ego') may form circles based on common bonds and at- 
tributes between themselves and the users whom they follow. In this example, the ego 
may wish to share their latest TKDD article only with their friends from the computer 
science department, while their baby photos should be shared only with their immediate 
family; similarly, they may wish to limit the amount of content generated by their high- 
school friends. These are precisely the types of functionality that circles are intended to 
facilitate. 
[Figure 1 image omitted; circle labels visible in the figure include 'friends under the
same advisor' and 'highschool friends'.]

Figure 1: An ego-network with labeled circles. The central user, the 'ego', is friends with
all other users (the 'alters') in the network. Alters may belong to any number of circles, 
including none. We aim to discover circle memberships and to find common properties 
around which circles form. This network shows typical behavior that we observe in our data: 
Approximately 25% of our ground-truth circles (from Facebook) are contained completely 
within another circle, 50% overlap with another circle, and 25% of the circles have no 
members in common with any other circle. 

Currently, users in Facebook, Google+ and Twitter identify their circles either man- 
ually, or in a naive fashion by identifying friends sharing a common feature. Neither 
approach is particularly satisfactory: the former is time consuming and does not update 
automatically as a user adds more friends, while the latter fails to capture individual as- 
pects of users' communities, and may function poorly when profile information is missing 
or withheld. 

In this paper we study the problem of automatically discovering users' social circles. 
In particular, given a single user with her personal social network, our goal is to identify 
her circles, each of which is a subset of her friends. 

Circles are user-specific as each user organizes her personal network of friends indepen- 
dently of all other users to whom she is not connected. This means that we can formulate 
the problem of circle detection as a clustering problem on her ego-network, the network of 
friendships between her friends. In practice, circles may overlap (a circle of friends from 
the same hometown may overlap with a circle from the same college), or be hierarchically 
nested (among friends from the same college there may be a denser circle from the same 
degree program). We design our model with both types of behavior in mind. 

In Figure 1 we are given a single user $u$ and we form a network between her friends $v_i$.
We refer to the user $u$ as the ego and to the nodes $v_i$ as alters. The task is then to identify
the circles to which each alter $v_i$ belongs, as in Figure 1. In other words, the goal is to find
communities/clusters in $u$'s ego-network.

Generally, there are two useful sources of data that help with this task. The first is the
set of edges of the ego-network. We expect that circles are formed by densely-connected
sets of alters (Newman, 2006). However, different circles overlap heavily, i.e., alters belong
to multiple circles simultaneously (Ahn et al., 2010; Palla et al., 2005), and many circles
are hierarchically nested in larger ones (as in Figure 1). Thus it is important to model an
alter's memberships to multiple circles. Secondly, we expect that each circle is not only
densely connected but that its members also share common properties or traits (Mislove
et al., 2010). Thus we need to explicitly model the different dimensions of user profiles
along which each circle emerges.

We model circle affiliations as latent variables, and similarity between alters as a func- 
tion of common profile information. We propose an unsupervised method to learn which 
dimensions of profile similarity lead to densely linked circles. After developing a model 
for this problem, we then study the related problems of updating a user's circles once new 
friends are added to the network, and using weak supervision from the user in the form 
of 'seed nodes' to improve classification. For the former problem, we show that given an 
already-defined set of a user's circles, we can accurately predict to which circles a new user
should be assigned. For the latter problem, we show that classification accuracy improves 
for each seed node that a user provides, though substantial improvements in accuracy are 
already obtained even with 2-3 seeds. 



Our model has two innovations: First, in contrast to mixed-membership models (Airoldi
et al., 2008) we predict hard assignment of a node to multiple circles, which proves critical
for good performance (Gregory, 2010b). Second, by proposing a parameterized definition
of profile similarity, we learn the dimensions of similarity along which links emerge (Feld,
1981; Simmel, 1964). This extends the notion of homophily (Lazarsfeld and Merton, 1954;
McPherson et al., 2001) by allowing different circles to form along different social
dimensions, an idea related to the concept of Blau spaces (McPherson, 1983). We achieve this
by allowing each circle to have a different definition of profile similarity, so that one circle
might form around friends from the same school, and another around friends from the
same location. We learn the model by simultaneously choosing node circle memberships
and profile similarity functions so as to best explain the observed data.

We introduce a dataset of 1,143 ego-networks from Facebook, Google+, and Twitter,
for which we obtain hand-labeled ground-truth from 5,541 circles. Experimental results
show that by simultaneously considering social network structure as well as user profile 
information our method performs significantly better than natural alternatives and the 
current state-of-the-art. Besides being more accurate our method also allows us to generate 
automatic explanations of why certain nodes belong to common communities. Our method 
is completely unsupervised, and is able to automatically determine both the number of 
circles as well as the circles themselves. We show that the same model can be adapted to 
deal with weak supervision, and to update already-complete circles as new users arrive. 



A preliminary version of this article appeared in McAuley and Leskovec (2012). 



1.1 Further Related Work 

Although a 'circle' is not precisely the same as a 'community', our work broadly falls
under the umbrella of community detection (Lancichinetti and Fortunato, 2009a; Schaeffer,
2007; Leskovec et al., 2010; Porter et al., 2009; Newman, 2004). While 'classical'
clustering algorithms assume disjoint communities (Schaeffer, 2007), many authors have made
the observation that communities in real-world networks may overlap (Lancichinetti and
Fortunato, 2009b; Gregory, 2010a; Lancichinetti et al., 2009; Yang and Leskovec, 2012), or
have hierarchical structure (Ravasz and Barabasi, 2003).

Topic-modeling techniques have been used to uncover 'mixed-memberships' of nodes
to multiple groups, and extensions allow entities to be attributed with text information.
Airoldi et al. (2008) modeled node attributes as latent variables drawn from a Dirichlet
distribution, so that each attribute can be thought of as a partial membership to a
community. Other authors extended this idea to allow for side-information associated with the
nodes and edges (Balasubramanyan and Cohen, 2011; Chang and Blei, 2009; Liu et al.,
2009). A related line of work by Hoff et al. (2002) also used latent node attributes to
model edge formation between 'similar' users, which they adapted to clustering problems
in Handcock et al. (2007b) and Krivitsky et al. (2009).



Classical clustering algorithms tend to identify communities based on node features
(Johnson, 1967) or graph structure (Ahn et al., 2010; Palla et al., 2005), but rarely use both
in concert. Our work is related to Yoshida (2010) in the sense that it performs clustering
on social-network data, and Streich et al. (2009), which models memberships to multiple
communities. Another work closely related to ours is Yang and Leskovec (2012), which 
explicitly models hard memberships of nodes to multiple overlapping communities, though 
it does so purely based on network information rather than node features. Our inference 
procedure is also similar to that of Hastings (2006), which treats nodes' assignments to 
communities as a Maximum a Posteriori inference problem between a set of interdependent 
variables. 

Finally, Chang et al. (2009), Menon and Elkan (2011, 2010) and Vu et al. (2011) model
network data similar to ours; like our own work, they model the probability that two nodes 
will form an edge, though the underlying models do not form communities, so they are not 
immediately applicable to the problem of circle detection. 

The rest of this paper is organized as follows. We propose a generative model for the
formation of edges within communities in Section 2. In Section 3 we derive an efficient
model parameter learning strategy. In Section 4 we describe extensions to our model that
allow it to be used in semi-supervised settings, in order to help users update and maintain
their circles. We describe the datasets that we construct in Section 5. We give two schemes
for automatically constructing parameterized user similarity functions from profile data in
Section 6. In Section 7 we show how to scale the model to large ego-networks. Finally, in
Section 8 we describe our evaluation and experimental results.



2 A Generative Model for Friendships in Social Circles 

We desire a model of circle formation with the following properties: 

1. Nodes within circles should have common properties, or 'aspects'. 

2. Different circles should be formed by different aspects, e.g. one circle might be formed 
by family members, and another by students who attended the same university. 

3. Circles should be allowed to overlap, and 'stronger' circles should be allowed to form 
within 'weaker' ones, e.g. a circle of friends from the same degree program may form 
within a circle from the same university, as in Figure 1.

4. We would like to leverage both profile information and network structure in order to
identify circles. 

5. Ideally we would like to be able to pinpoint which aspects of a profile caused a circle 
to form, so that the model is interpretable by the user. 

The input to our model is an ego-network $G = (V, E)$, along with 'profiles' for each
user $v \in V$. The 'center' node $u$ of the ego-network (the 'ego') is not included in $G$, but
rather $G$ consists only of $u$'s friends (the 'alters'). We define the ego-network in this way
precisely because creators of circles do not themselves appear in their own circles. For each
ego-network, our goal is to predict a set of circles $\mathcal{C} = \{C_1 \ldots C_K\}$, $C_k \subseteq V$, and
associated parameter vectors $\theta_k$ that encode how each circle emerged. We encode 'user
profiles' into pairwise features $\phi(x, y)$ that in some way capture what properties the users
$x$ and $y$ have in common. We first describe our model, which can be applied using arbitrary
feature vectors $\phi(x, y)$, and in Section 6 we develop several ways to construct feature
vectors $\phi(x, y)$ that are suited to our particular application.

We describe a model of social circles that treats circle memberships as latent variables. 
Nodes within a common circle are given an opportunity to form an edge, which naturally 
leads to hierarchical and overlapping circles. We will then devise an unsupervised algorithm 
to jointly optimize the latent variables and the profile similarity parameters so as to best 
explain the observed network data. 

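Our model of social circles is defined as follows. Given an ego-network $G$ and a set of
$K$ circles $\mathcal{C} = \{C_1 \ldots C_K\}$, we model the probability that a pair of nodes
$(x, y) \in V \times V$ form an edge as

$$p((x, y) \in E) \propto \exp\Big\{ \underbrace{\sum_{C_k \supseteq \{x, y\}} \langle \phi(x, y), \theta_k \rangle}_{\text{circles containing both nodes}} - \underbrace{\sum_{C_k \nsupseteq \{x, y\}} \alpha_k \langle \phi(x, y), \theta_k \rangle}_{\text{all other circles}} \Big\}. \quad (1)$$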

For each circle $C_k$, $\theta_k$ is the profile similarity parameter that we will learn. The idea is
that $\langle \phi(x, y), \theta_k \rangle$ is high if both nodes belong to $C_k$, and low if either of them does
not. The parameter $\alpha_k$ trades off these two effects, i.e., it trades off the influence of edges
within $C_k$ compared to edges outside of (or crossing) $C_k$. Since the feature vector $\phi(x, y)$
encodes the similarity between the profiles of two users $x$ and $y$, the parameter vector
$\theta_k$ encodes which dimensions of profile similarity caused the circle to form, so that nodes
within a circle $C_k$ should 'look similar' according to $\theta_k$. Note that the pair $(x, y)$ should be
treated as an unordered pair in the case of an undirected network (e.g. Facebook), but should
be treated as an ordered pair for directed networks (e.g. Google+ and Twitter).
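
As a minimal illustration of (eq. 1), the sketch below evaluates the score inside the
exponent for one pair of nodes and converts it to a probability with the logistic function,
which is the normalization implied by the constant $Z$ in (eq. 3). The function names, the
toy circles, and all numeric values are assumptions made for illustration; they are not
taken from the authors' implementation.

```python
import math

def edge_score(phi, circles, thetas, alphas, x, y):
    """Score for the pair (x, y): circles containing both nodes add
    <phi, theta_k>; every other circle subtracts alpha_k * <phi, theta_k>."""
    total = 0.0
    for members, theta, alpha in zip(circles, thetas, alphas):
        inner = sum(p * t for p, t in zip(phi, theta))
        if x in members and y in members:
            total += inner
        else:
            total -= alpha * inner
    return total

def edge_probability(score):
    """Normalize exp(score) against the 'no edge' outcome (logistic function)."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical example: two circles and three-dimensional pairwise features.
circles = [{"a", "b", "c"}, {"c", "d"}]
thetas = [[1.0, 2.0, 0.0], [0.5, 0.0, 1.0]]
alphas = [1.0, 0.5]
phi_ab = [1.0, 1.0, 0.0]  # assumed similarity features for the pair (a, b)

score_ab = edge_score(phi_ab, circles, thetas, alphas, "a", "b")
prob_ab = edge_probability(score_ab)
```

In this toy setting the pair (a, b) lies inside the first circle only, so the first circle
raises the score and the second lowers it in proportion to its $\alpha_k$.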

Considering that edges $e = (x, y)$ are generated independently, we can write the
probability of $G$ as

$$P_\Theta(G; \mathcal{C}) = \prod_{e \in E} p(e \in E) \times \prod_{e \notin E} p(e \notin E), \quad (2)$$

where $\Theta = \{(\theta_k, \alpha_k)\}^{k = 1 \ldots K}$ is our set of model parameters. Defining the shorthand
notation

$$d_k(e) = \delta(e \in C_k) - \alpha_k \delta(e \notin C_k), \qquad \Phi(e) = \sum_{C_k \in \mathcal{C}} d_k(e) \langle \phi(e), \theta_k \rangle$$

allows us to write the log-likelihood of $G$:

$$l_\Theta(G; \mathcal{C}) = \sum_{e \in E} \Phi(e) - \sum_{e \in V \times V} \log\left(1 + e^{\Phi(e)}\right), \quad (3)$$

where $Z = 1 + e^{\Phi(e)}$ is a normalization constant.
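
The step from (eq. 2) to (eq. 3) can be made explicit. Normalizing (eq. 1) over the two
possible outcomes for a pair $e$ (edge or no edge) with the constant $Z$ gives the
following; this is a short derivation sketch added for clarity, written in the notation
defined above.

```latex
\begin{align*}
p(e \in E) &= \frac{e^{\Phi(e)}}{1 + e^{\Phi(e)}}, \qquad
p(e \notin E) = \frac{1}{1 + e^{\Phi(e)}}, \\
\log P_\Theta(G; \mathcal{C})
  &= \sum_{e \in E} \left[ \Phi(e) - \log\left(1 + e^{\Phi(e)}\right) \right]
   - \sum_{e \notin E} \log\left(1 + e^{\Phi(e)}\right) \\
  &= \sum_{e \in E} \Phi(e) - \sum_{e \in V \times V} \log\left(1 + e^{\Phi(e)}\right).
\end{align*}
```

The last line merges the two normalization sums, since every pair is either an edge or a
non-edge.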

Next, we describe how to optimize node circle memberships $\mathcal{C}$ as well as the parameters
of the user profile similarity functions $\Theta = \{(\theta_k, \alpha_k)\}$ $(k = 1 \ldots K)$ given a graph $G$ and
user profiles.



3 Unsupervised Learning of Model Parameters 



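Treating circles $\mathcal{C}$ as latent variables, we aim to find $\hat{\Theta} = \{\hat{\theta}, \hat{\alpha}\}$ so as to maximize the
regularized log-likelihood of (eq. 3), i.e.,

$$\hat{\Theta}, \hat{\mathcal{C}} = \operatorname*{argmax}_{\Theta, \mathcal{C}} \; l_\Theta(G; \mathcal{C}) - \lambda \Omega(\theta). \quad (4)$$

We solve this problem using coordinate ascent on $\Theta$ and $\mathcal{C}$ (MacKay, 2003):

$$\mathcal{C}^t = \operatorname*{argmax}_{\mathcal{C}} \; l_{\Theta^t}(G; \mathcal{C}) \quad (5)$$

$$\Theta^{t+1} = \operatorname*{argmax}_{\Theta} \; l_\Theta(G; \mathcal{C}^t) - \lambda \Omega(\theta). \quad (6)$$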


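We optimize (eq. 6) using L-BFGS, a standard quasi-Newton procedure to optimize smooth
functions of many variables (Nocedal, 1980). Partial derivatives of the log-likelihood are
given by

$$\frac{\partial l}{\partial \theta_k} = \sum_{e \in V \times V} -d_k(e) \phi(e) \frac{e^{\Phi(e)}}{1 + e^{\Phi(e)}} + \sum_{e \in E} d_k(e) \phi(e), \quad (7)$$

$$\frac{\partial l}{\partial \alpha_k} = \sum_{e \in V \times V} \delta(e \notin C_k) \langle \phi(e), \theta_k \rangle \frac{e^{\Phi(e)}}{1 + e^{\Phi(e)}} - \sum_{e \in E} \delta(e \notin C_k) \langle \phi(e), \theta_k \rangle; \quad (8)$$

the regularizer in (eq. 6) contributes an additional term $-\lambda \, \partial \Omega / \partial \theta_k$ to (eq. 7).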
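
The gradient with respect to $\theta_k$ can be sanity-checked numerically. The sketch below
implements the log-likelihood of (eq. 3) and the derivative of (eq. 7) in plain Python and
compares the latter against central finite differences. The toy graph, the feature values,
and all function names are hypothetical, and the regularizer is left out so that only the
likelihood terms are tested.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def d_k(pair, members, alpha):
    """d_k(e): +1 if both endpoints are in the circle, -alpha_k otherwise."""
    x, y = pair
    return 1.0 if (x in members and y in members) else -alpha

def big_phi(pair, feats, circles, thetas, alphas):
    """Phi(e): sum over circles of d_k(e) * <phi(e), theta_k>."""
    return sum(d_k(pair, c, a) * inner(feats[pair], t)
               for c, t, a in zip(circles, thetas, alphas))

def log_likelihood(pairs, edges, feats, circles, thetas, alphas):
    """(eq. 3): Phi summed over edges minus log(1 + e^Phi) over all pairs."""
    ll = 0.0
    for e in pairs:
        s = big_phi(e, feats, circles, thetas, alphas)
        ll += (s if e in edges else 0.0) - math.log1p(math.exp(s))
    return ll

def grad_theta(k, pairs, edges, feats, circles, thetas, alphas):
    """(eq. 7): derivative of the log-likelihood with respect to theta_k."""
    grad = [0.0] * len(thetas[k])
    for e in pairs:
        s = big_phi(e, feats, circles, thetas, alphas)
        sigma = 1.0 / (1.0 + math.exp(-s))  # equals e^Phi / (1 + e^Phi)
        observed = 1.0 if e in edges else 0.0
        weight = d_k(e, circles[k], alphas[k]) * (observed - sigma)
        for i, f in enumerate(feats[e]):
            grad[i] += weight * f
    return grad

# Hypothetical toy problem: four alters, unordered pairs, two circles.
nodes = ["a", "b", "c", "d"]
pairs = [(x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]]
edges = {("a", "b"), ("b", "c"), ("c", "d")}
feats = {e: [1.0, float(i % 2), 0.5 * i] for i, e in enumerate(pairs)}
circles = [{"a", "b", "c"}, {"c", "d"}]
thetas = [[0.3, -0.2, 0.1], [0.0, 0.4, -0.1]]
alphas = [1.0, 0.5]

analytic = grad_theta(0, pairs, edges, feats, circles, thetas, alphas)

# Central finite differences on each coordinate of theta_0.
eps = 1e-6
numeric = []
for i in range(len(thetas[0])):
    plus = [t[:] for t in thetas]
    minus = [t[:] for t in thetas]
    plus[0][i] += eps
    minus[0][i] -= eps
    numeric.append((log_likelihood(pairs, edges, feats, circles, plus, alphas)
                    - log_likelihood(pairs, edges, feats, circles, minus, alphas))
                   / (2 * eps))

max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
```

A small `max_gap` indicates that the analytic expression agrees with the numerical slope of
the log-likelihood; the same check can be repeated for (eq. 8).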



For fixed $\mathcal{C} \setminus C_k$ we note that solving $\operatorname{argmax}_{C_k} l_\Theta(G; \mathcal{C} \setminus C_k)$ can be expressed as
pseudo-boolean optimization in a pairwise graphical model (Boros and Hammer, 2002).
'Pseudo-boolean optimization' refers to problems defined over boolean variables (in this
case, whether or not a node is assigned to a particular community), where the variables
being optimized are interdependent (in this case, relationships are defined over edges in a
graph). In short, our optimization problem can be written in the form

$$C_k = \operatorname*{argmax}_{C} \sum_{(x, y) \in V \times V} E^k_{(x, y)}(\delta(x \in C), \delta(y \in C)). \quad (9)$$

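Although this problem class is NP-hard in general, efficient approximation algorithms are
readily available (Rother et al., 2007). In our setting, we want edges with high weight
(under $\theta_k$) to appear in $C_k$, and edges with low weight to appear outside of $C_k$. Defining

$$o_k(e) = \sum_{C_j \in \mathcal{C} \setminus C_k} d_j(e) \langle \phi(e), \theta_j \rangle,$$

the energy $E^k_e$ of (eq. 9) for a pair $e = (x, y)$ is

$$E^k_e(0, 0) = E^k_e(0, 1) = E^k_e(1, 0) = \begin{cases} o_k(e) - \alpha_k \langle \phi(e), \theta_k \rangle - \log\left(1 + e^{o_k(e) - \alpha_k \langle \phi(e), \theta_k \rangle}\right), & e \in E \\ -\log\left(1 + e^{o_k(e) - \alpha_k \langle \phi(e), \theta_k \rangle}\right), & e \notin E \end{cases}$$

$$E^k_e(1, 1) = \begin{cases} o_k(e) + \langle \phi(e), \theta_k \rangle - \log\left(1 + e^{o_k(e) + \langle \phi(e), \theta_k \rangle}\right), & e \in E \\ -\log\left(1 + e^{o_k(e) + \langle \phi(e), \theta_k \rangle}\right), & e \notin E. \end{cases}$$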



By expressing the problem in this form we can draw upon existing work on pseudo-boolean
optimization. We use the publicly-available 'QPBO' software described in Rother et al.
(2007), which implements algorithms described in Hammer et al. (1984) and Kohli and
Torr (2005), and is able to accurately approximate problems of the form shown in (eq. 9).
Essentially, problems of the type shown in (eq. 9) are reduced to maximum flow, where
boolean labels for each node are recovered from their assignments to 'source' and 'sink' sets.
Such algorithms have worst-case complexity $O(|N|^3)$, though the average case running-time
is far better (Kolmogorov and Rother, 2007). We solve (eq. 9) for each $C_k$ in a random
order.
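
QPBO is an external solver, so the sketch below does not call it. Instead it enumerates
every candidate subset of a tiny ego-network to make the objective of (eq. 9) concrete.
Brute force is a stand-in chosen purely for illustration (its cost is exponential in the
number of nodes) and is not the method used in this paper. The argument `o` plays the role
of $o_k(e)$, `inner` holds assumed values of $\langle \phi(e), \theta_k \rangle$, and all numbers are
hypothetical.

```python
import math
from itertools import combinations

def pair_energy(both_in, is_edge, o, inner, alpha):
    """Pairwise energy: log-probability of the observed edge or non-edge,
    given whether both endpoints are placed in the candidate circle."""
    s = o + inner if both_in else o - alpha * inner
    return (s if is_edge else 0.0) - math.log1p(math.exp(s))

def best_circle(nodes, edges, inner, o, alpha):
    """Exhaustively maximize the sum of pairwise energies over subsets of nodes."""
    pairs = list(combinations(nodes, 2))
    best, best_val = set(), -math.inf
    for r in range(len(nodes) + 1):
        for subset in combinations(nodes, r):
            members = set(subset)
            val = sum(pair_energy(x in members and y in members,
                                  (x, y) in edges,
                                  o[(x, y)], inner[(x, y)], alpha)
                      for x, y in pairs)
            if val > best_val:
                best, best_val = members, val
    return best, best_val

# Hypothetical ego-network: a triangle a-b-c plus an isolated alter d.
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"), ("a", "c"), ("b", "c")}
pairs = list(combinations(nodes, 2))
inner = {e: 2.0 for e in pairs}  # assumed <phi(e), theta_k>, equal for all pairs
o = {e: 0.0 for e in pairs}      # no other circles in this toy example

circle, value = best_circle(nodes, edges, inner, o, alpha=1.0)
```

With these assumed values every pair scores highest when "both endpoints in the circle"
coincides with "edge observed", so the densely connected triangle is recovered as the circle.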

The two optimization steps of (eq. 5) and (eq. 6) are repeated until convergence, i.e.,
until $\mathcal{C}^{t+1} = \mathcal{C}^t$. The entire procedure is presented in Algorithm 1. We regularize (eq. 4)
using the $\ell_1$ norm, i.e.,

$$\Omega(\theta) = \sum_{k=1}^{K} \sum_{i=1}^{|\theta_k|} |\theta_{ki}|,$$


which leads to sparse (and readily interpretable) parameters. Our algorithm can readily
handle all but the largest problem sizes typically observed in ego-networks: in the case of
Facebook, the average ego-network has around 190 nodes (Ugander et al., 2011), while the
largest network we encountered has 4,964 nodes. Later, in Section 7, we will exploit the
fact that our features are binary, and that many nodes share similar features, to develop
more efficient algorithms based on Markov Chain Monte Carlo inference. Note that since
the method is unsupervised, inference is performed independently for each ego-network.
This means that our method could be run on the full Facebook graph (for example), as
circles are independently detected for each user, and the ego-networks typically contain
only hundreds of nodes. In Section 4 we describe extensions that allow our model to be
used in semi-supervised settings.

3.1 Hyperparameter Estimation 

To choose the optimal number of circles, we choose $K$ so as to minimize an approximation
to the Bayesian Information Criterion (BIC), an idea seen in several works on probabilistic
clustering (Airoldi et al., 2008; Handcock et al., 2007a; Volinsky and Raftery, 2000). In
this context, the Bayesian Information Criterion is defined as

$$BIC(K; \Theta^K) \simeq -2 l_{\Theta^K}(G; \mathcal{C}) + |\Theta^K| \log |E|, \quad (10)$$

where $\Theta^K$ is the set of parameters predicted when there are $K$ circles, and $|\Theta^K|$ is the
number of parameters (which increases linearly as $K$ increases). We then choose $K$ so as
to minimize this objective:

$$\hat{K} = \operatorname*{argmin}_{K} \; BIC(K; \Theta^K). \quad (11)$$
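
The selection rule of (eq. 10) and (eq. 11) amounts to a few lines of arithmetic once the
model has been fitted for each candidate $K$. In the sketch below the fitted log-likelihoods,
the number of parameters per circle, and the edge count are invented purely for
illustration.

```python
import math

def bic(log_likelihood, num_params, num_edges):
    """Approximate BIC of (eq. 10): -2 * l + |Theta^K| * log|E|."""
    return -2.0 * log_likelihood + num_params * math.log(num_edges)

def choose_k(log_likelihoods, params_per_circle, num_edges):
    """Return the K with the smallest BIC; log_likelihoods maps K to the fitted l."""
    scores = {k: bic(ll, k * params_per_circle, num_edges)
              for k, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores

# Hypothetical fitted log-likelihoods for K = 1..4 on a network with 500 edges,
# assuming 7 parameters per circle (6 feature weights plus one alpha_k).
fitted = {1: -900.0, 2: -700.0, 3: -650.0, 4: -648.0}
best_k, scores = choose_k(fitted, params_per_circle=7, num_edges=500)
```

Here the fourth circle improves the log-likelihood only marginally, so the penalty term
outweighs the gain and the smaller model is preferred.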
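ALGORITHM 1: Predict complete circles with hyperparameters $\lambda$, $K$.

Data: ego-network $G = (V, E)$, edge features $\phi(e) : E \to \mathbb{R}^F$, hyperparameters $\lambda$, $K$
Result: parameters $\hat{\Theta} := \{(\hat{\theta}_k, \hat{\alpha}_k)\}^{k = 1 \ldots K}$, communities $\hat{\mathcal{C}}$
initialize $\theta^0_k \in \{0, 1\}^F$, $\alpha^0_k := 1$, $C_k := \emptyset$, $t := 0$;
repeat
    for $k \in \{1 \ldots K\}$ do
        $C^t_k := \operatorname{argmax}_C \sum_{(x, y) \in V \times V} E^k_{(x, y)}(\delta(x \in C), \delta(y \in C))$;
        // using QPBO, see (eq. 9)
    end
    $\Theta^{t+1} := \operatorname{argmax}_\Theta \; l_\Theta(G; \mathcal{C}^t) - \lambda \Omega(\theta)$;
    // using L-BFGS, see (eqs. 7 and 8)
    $t := t + 1$;
until $\mathcal{C}^{t+1} = \mathcal{C}^t$;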



In other words, an additional circle will only be added to the model if doing so has a
'significant' impact on the log-likelihood.

The regularization parameter $\lambda \in \{0, 1, 10, 100\}$ was determined using leave-one-out
cross validation, though in our experience it did not significantly impact performance.



4 Extensions 

So far, we have considered the 'cold-start' problem of predicting complete sets of circles 
using nothing but node attributes and edge information. In other words, we have treated 
circle prediction as an unsupervised task. This setting is realistic if users construct their 
circles only after their ego-networks have already been defined. On the other hand, in 
settings where users build their circles incrementally, it is less likely that we would wish 
to predict complete circles 'from scratch'. We note that both settings occur in the three 
social networks that we consider. 

In this section, we describe techniques to exploit partially observed circle information
to help users update and maintain their circles. In other words, we would like to apply
our model to users' personal networks as they change and evolve (Backstrom et al., 2006).
Since our model is probabilistic, it is straightforward to adapt it to make use of partially
observed data, by conditioning on the assignments of some of the latent variables in our
model. In this way, we adapt our model for semi-supervised settings in which a user labels
some or all of the members of their circles. Later, in Section 7, we describe modifications
of our model that allow it to be applied to extremely large networks, by exploiting the fact
that many users assigned to common circles also have common features.



4.1 Circle Maintenance 

First we deal with the problem of a user adding new friends to an established ego-network, 
whose circles have already been defined. Thus, given a complete set of circles, our goal 
is to predict community memberships for a new node, based on that node's features, and 
their patterns of connectivity to existing nodes in the ego-network. 

Since circles in this setting are fully observed, we simply fit the model parameters that
best explain the ground-truth circles $\mathcal{C}$ provided by the user:

$$\hat{\Theta} = \operatorname*{argmax}_{\Theta} \; l_\Theta(G; \mathcal{C}) - \lambda \Omega(\theta). \quad (12)$$

As with (eq. 6) this is solved using L-BFGS, though optimization is significantly faster in
this case as there are no longer latent community memberships to infer, and thus coordinate
ascent is not required.

Next, we must predict to which of the $K$ ground-truth circles a new user $u$ belongs.
That is, we must predict $c^u \in \{0, 1\}^K$, where each $c^u_k$ is a binary variable indicating whether
the user $u$ should belong to the circle $C_k$. In practice, for the sake of evaluation, we shall
suppress a single user from $G$ and $\mathcal{C}$, and try to recover their memberships.

This can be done by choosing the assignment $c^u$ that maximizes the log-likelihood of
$\mathcal{C}$ once $u$ is added to the graph. We define the augmented community memberships as
$\mathcal{C}^+ = \{C^+_k\}^{k = 1 \ldots K}$, where

$$C^+_k = \begin{cases} C_k \cup \{u\}, & c^u_k = 1 \\ C_k, & c^u_k = 0. \end{cases} \quad (13)$$

The updated community memberships are then chosen according to

$$\hat{\mathcal{C}}^+ = \operatorname*{argmax}_{c^u} \; l_{\hat{\Theta}}(G \cup \{u\}; \mathcal{C}^+). \quad (14)$$

The above expression can be computed efficiently for different values of $c^u$ by noting that
the log-likelihood only changes for terms including $u$, meaning that we need to compute
$p((x, y) \in E)$ only if $x = u$ or $y = u$. In other words, we only need to consider how the
new user relates to existing users, rather than considering how existing users relate to each
other; thus computing the log-likelihood requires linear (rather than quadratic) time. To
find the optimal $c^u$ we can simply enumerate all $2^K$ possibilities, which is feasible so long
as the user has no more than $K \simeq 20$ circles. For users with more circles we must resort
to an iterative update scheme as we did in Section 3.
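
The enumeration over $2^K$ assignments described above can be sketched as follows. Only the
log-likelihood terms that involve the new user $u$ are evaluated, one per existing node.
The data layout (a dictionary of features $\phi(u, v)$ keyed by existing node), the
function names, and the numbers are assumptions for illustration, and the network is
treated as undirected.

```python
import math
from itertools import product

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def new_user_loglik(assignment, neighbours, feats, circles, thetas, alphas):
    """Log-likelihood terms involving the new user only (linear in |V|)."""
    ll = 0.0
    for v, phi in feats.items():  # one term per existing node v
        s = 0.0
        for c_k, members, theta, alpha in zip(assignment, circles, thetas, alphas):
            both_in = c_k == 1 and v in members
            s += inner(phi, theta) if both_in else -alpha * inner(phi, theta)
        ll += (s if v in neighbours else 0.0) - math.log1p(math.exp(s))
    return ll

def assign_new_user(neighbours, feats, circles, thetas, alphas):
    """Enumerate all 2^K binary membership vectors and keep the most likely one."""
    return max(product((0, 1), repeat=len(circles)),
               key=lambda c: new_user_loglik(c, neighbours, feats,
                                             circles, thetas, alphas))

# Hypothetical established circles with fitted parameters.
circles = [{"a", "b"}, {"c", "d"}]
thetas = [[2.0], [2.0]]
alphas = [1.0, 1.0]
feats = {v: [1.0] for v in "abcd"}  # assumed phi(u, v) for each existing node v
neighbours = {"a", "b"}             # the new user u is linked to a and b only

c_u = assign_new_user(neighbours, feats, circles, thetas, alphas)
```

In this toy case the new user is connected to every member of the first circle and to no
member of the second, so the first circle alone is selected.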

4.2 Semi-Supervised Circle Prediction 

Next, we consider the problem of using weak supervision in the form of 'seed nodes' to
assist in circle prediction (Andersen and Lang, 2006). In this setting, the user manually
labels a few users from each of the circles they want to create, say $\{s_1 \ldots s_K\}$. Our goal
is then to predict $K$ circles $\mathcal{C} = \{C_1 \ldots C_K\}$ subject to the constraint that $s_k \subseteq C_k$ for all
$k \in \{1 \ldots K\}$.

Again, since our model is probabilistic, this can be done by conditioning on the
assignments of some of the latent variables. That is, we simply optimize $l_\Theta(G; \mathcal{C})$ subject
to the constraint that $s_k \subseteq C_k$ for all $k \in \{1 \ldots K\}$. In the parlance of graphical models,
this means that rather than treating the seed nodes as latent variables to be predicted, we
treat them as evidence on which we condition. We could also include negative evidence
(i.e., the user could provide labels for users who do not belong to each circle), or we could
have users provide additional labels interactively, though the setting described is the most
similar to what is used in practice.
Figure 2: Histogram of overlap between circles (on Facebook). A value of zero indicates
that the circle does not intersect with any of the user's other circles, whereas a value of one 
indicates that a circle is entirely contained within another. Approximately 25% of circles 
exhibit the latter behavior. 

5 Dataset Description 

Our goal is to evaluate our method on ground-truth data. We expended significant time,
effort, and resources to obtain high quality hand-labeled data, which we have made available
online.^1 We were able to obtain ego-networks and ground-truth from three major social
networking sites: Facebook, Google+, and Twitter.

From Facebook we obtained profile and network data from 10 ego-networks, consisting 
of 193 circles and 4,039 users. To obtain circle information we developed our own Facebook 
application and conducted a survey of ten users, who were asked to manually identify all 
of the circles to which their friends belonged. It took each user between 2 and 3 hours 
to label their entire network. On average, users identified 19 circles in their ego-networks, 
with an average circle size of 22 friends. Examples of circles we obtained include students 
of common universities and classes, sports teams, relatives, etc. 

Figure 2 shows the extent to which our 193 user-labeled circles in 10 ego-networks from
Facebook overlap (intersect) with each other. Around one quarter of the identified circles
are independent of any other circle, though a similar fraction are completely contained
within another circle (e.g. friends who studied under the same adviser may be a subset
of friends from the same university). The remaining 50% of communities overlap to some
extent with another circle.

For the other two datasets we obtained publicly accessible data. From Google+ we
obtained data from 133 ego-networks, consisting of 479 circles and 106,674 users. The 133
ego-networks represent all 133 Google+ users who had shared at least two circles, and
whose network information was publicly accessible at the time of our crawl. The Google+
circles are quite different to those from Facebook, in the sense that their creators have
chosen to release them publicly, and because Google+ is a directed network (note that our
model can very naturally be applied to both directed and undirected networks). For
example, one circle contains candidates from the 2012 Republican primary, who presumably
do not follow their followers, nor each other. Finally, from Twitter we obtained data from
1,000 ego-networks, consisting of 4,869 circles (or 'lists' (Kim et al., 2010; Nasirifard and
Hayes, 2011; Wu et al., 2011; Zhao, 2011)) and 81,362 users. The ego-networks we obtained
range in size from 10 to 4,964 nodes.

^1 http://snap.stanford.edu/data/

Taken together, our data contains 1,143 different ego-networks, 5,541 circles, and 192,075
users. The size differences between these datasets simply reflect the availability of data
from each of the three sources. Our Facebook data is fully labeled, in the sense that we
obtain every circle that a user considers to be a cohesive community, whereas our Google+
and Twitter data is only partially labeled, in the sense that we only have access to public
circles. We design our evaluation procedure in Section 8 so that partial labels cause no
issues.



6 Constructing Features from User Profiles 

Profile information in all of our datasets can be represented as a tree where each level
encodes increasingly specific information (Figure 3, left). In other words, user profiles are
organized into increasingly specific categories. For example, a user's profile might have an
education category, which would be further separated into categories such as name, location,
and type. The leaves of the tree are then specific values in these categories, e.g. Princeton,
Cambridge, and Graduate School. Several works deal with automatically building features
from tree-structured data (Haussler, 1999; Vishwanathan and Smola, 2002), but in order
to understand the relationship between circles and user profile information, we shall design
our own feature representation scheme.

We propose two hypotheses for how users organize their social circles: either they may 
form circles around users who share some common property with each other, or they may 
form circles around users who share some common property with themselves. For example, 
if a user has many friends who attended Stanford, then they may form a 'Stanford' circle. 
On the other hand, if they themselves did not attend Stanford, they may not consider 
attendance at Stanford to be a salient feature. The feature construction schemes we propose
allow us to assess which of these hypotheses better represents the data we obtain. 

From Google+ we collect data from six categories (gender, last name, job titles, insti- 
tutions, universities, and places lived). From Facebook we collect data from 26 categories, 
including users' hometowns, birthdays, colleagues, political and religious affiliations, etc. 
As a proxy for profile data, from Twitter we collect data from two categories, namely 
the set of hashtags and mentions used by each user during two-weeks' worth of tweets. 
'Categories' correspond to parents of leaf nodes in a profile tree, as shown in Figure 3.

We first propose a difference vector to encode the relationship between two profiles.
A non-technical description is given in Figure 3. Essentially, we want to encode those
dimensions where two users are the same (e.g. Alan and Dilly went to the same graduate
school), and those where they are different (e.g. they do not have the same surname).
Suppose that users $v \in V$ each have an associated profile tree $T_v$, and that $l \in T_v$ is a leaf
in that tree. We define the difference vector $\sigma_{x,y}$ between two users $x$ and $y$ as a binary
indicator encoding the profile aspects where users $x$ and $y$ differ (Figure 3, bottom left):

$$\sigma_{x,y}[l] = \delta((l \in T_x) \neq (l \in T_y)). \quad (15)$$

Note that feature descriptors are defined per ego-network: while many thousands of high 
schools (for example) exist among all Facebook users, only a small number appear among 
any particular user's friends. 
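
As an illustration of (eq. 15), the following minimal sketch builds a difference vector for two made-up profiles. It assumes each profile tree is stored as a set of root-to-leaf paths; the function name `difference_vector` and the example profiles are hypothetical and not part of our implementation.

```python
def difference_vector(profile_x, profile_y, leaves):
    """Return sigma_{x,y}: 1 where exactly one of the two profiles contains the leaf."""
    return [int((leaf in profile_x) != (leaf in profile_y)) for leaf in leaves]

# Hypothetical profiles: each is a set of root-to-leaf paths in the profile tree.
profile_x = {("education", "graduate school", "School A"), ("last name", "P")}
profile_y = {("education", "graduate school", "School A"), ("last name", "Q")}

# Leaves are enumerated per ego-network, in a fixed (here: sorted) order.
leaves = sorted(profile_x | profile_y)
sigma = difference_vector(profile_x, profile_y, leaves)
```

Here the shared graduate school yields a 0 and the two distinct surnames each yield a 1.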
Although the above difference vector has the advantage that it encodes profile infor-
mation at a fine granularity, it has the disadvantage that it is high-dimensional (up to
4,122 dimensions in the data we considered). One way to address this is to form difference
vectors based on the parents of leaf nodes: this way, we encode what profile categories two
users have in common, but disregard specific values (Figure 3, bottom right). For example,
we encode how many hashtags two users tweeted in common, but discard which hashtags
they tweeted:

$$\sigma'_{x,y}[p] = \sum_{l \in \mathrm{children}(p)} \sigma_{x,y}[l]. \qquad (16)$$

This scheme has the advantage that it requires a constant number of dimensions, regardless
of the size of the ego-network (26 for Facebook, 6 for Google+, 2 for Twitter, as described
above).
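
The aggregation in (eq. 16) can be sketched as follows. This is only an illustration: it assumes leaves are stored as root-to-leaf paths, so that the parent category of a leaf is the path with its last element removed; the helper name `compressed_difference` and the example data are hypothetical.

```python
from collections import defaultdict

def compressed_difference(sigma, leaves):
    """Sum the leaf-level difference vector over leaves sharing a parent category."""
    totals = defaultdict(int)
    for value, leaf in zip(sigma, leaves):
        parent = leaf[:-1]  # the category is the path without the leaf value
        totals[parent] += value
    return dict(totals)

# Hypothetical Twitter-style example with two categories.
leaves = [("hashtag", "#a"), ("hashtag", "#b"), ("mention", "@c")]
sigma = [1, 0, 1]
sigma_prime = compressed_difference(sigma, leaves)
```

The output has one entry per category, regardless of how many distinct leaf values appear in the ego-network.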

Based on the difference vectors $\sigma_{x,y}$ (and $\sigma'_{x,y}$) we now describe how to construct edge
features $\phi(x,y)$. The first property we wish to model is that members of circles should
have common relationships with each other:

$$\phi^1(x,y) = (1; -\sigma_{x,y}). \qquad (17)$$

The second property we wish to model is that members of circles should have common
relationships to the ego of the ego-network. In this case, we consider the profile tree $T_u$
from the ego user $u$. We then define our features in terms of that user:

$$\phi^2(x,y) = (1; -|\sigma_{x,u} - \sigma_{y,u}|) \qquad (18)$$

($|\sigma_{x,u} - \sigma_{y,u}|$ is taken elementwise). These two parameterizations allow us to assess which
mechanism better captures users' subjective definition of a circle. In both cases, we include 
a constant feature ('1'), which controls the probability that edges form within circles, or 
equivalently it measures the extent to which circles are made up of friends. Importantly, 
this allows us to predict memberships even for users who have no profile information, simply 
due to their patterns of connectivity. 
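
The two parameterizations in (eq. 17) and (eq. 18) can be sketched as below, assuming difference vectors are plain lists of 0/1 values. The function names `phi1` and `phi2` are illustrative only.

```python
def phi1(sigma_xy):
    """Friend-to-friend features: constant 1 followed by the negated difference vector."""
    return [1] + [-s for s in sigma_xy]

def phi2(sigma_xu, sigma_yu):
    """Friend-to-ego features: constant 1 followed by the negated elementwise
    absolute difference between each friend's difference vector to the ego u."""
    return [1] + [-abs(a - b) for a, b in zip(sigma_xu, sigma_yu)]

features_1 = phi1([0, 1, 1])
features_2 = phi2([1, 0, 1], [1, 1, 0])
```

In both cases a user with an empty profile still receives the constant feature, which is what allows memberships to be predicted from connectivity alone.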

Similarly, for the 'compressed' difference vector $\sigma'_{x,y}$, we define

$$\psi^1(x,y) = (1; -\sigma'_{x,y}), \qquad \psi^2(x,y) = (1; -|\sigma'_{x,u} - \sigma'_{y,u}|). \qquad (19)$$

To summarize, we have identified four ways of representing the compatibility between dif-
ferent aspects of profiles for two users. We considered two ways of constructing a difference
vector ($\sigma_{x,y}$ vs. $\sigma'_{x,y}$) and two ways of capturing the compatibility between a pair of profiles
($\phi(x,y)$ vs. $\psi(x,y)$). The features are designed to model the following behavior:

1. Ego users build circles around common relationships between their friends ($\phi^1$, $\psi^1$)

2. Ego users build circles around common relationships between their friends and them-
selves ($\phi^2$, $\psi^2$)

In our experiments we assess which of these assumptions is more realistic in practice.

7 Fast Inference in Large Ego-Networks

Although our algorithm is able to handle the problem sizes typically encountered in ego-
networks (i.e., fewer than 1,000 friends), scalability to larger networks presents an issue,
as we require quadratic memory to encode the compatibility between every pair of nodes
(an issue we note is also present in the existing approaches we consider in Section 8). In
this section, we propose a more scalable alternative that makes use of the fact that many
nodes belonging to common communities also share common features.

Noting that features $\phi^1$ and $\phi^2$ described in Section 6 are binary valued, as are com-
munity memberships, if there are $K$ communities and $F$-dimensional features, there can
be at most $2^{K+F}$ 'types' of node. In other words, every node's community membership is
drawn from $\{0,1\}^K$, and every node's feature vector is drawn from $\{0,1\}^F$, so there are
at most $2^{K+F}$ distinct community/feature combinations. Of course the number of distinct
node types is also bounded by $|V|$, the number of nodes in the graph.

In practice, however, the number of distinct node 'types' is much smaller, as nodes
belonging to common communities tend to have common features. Community member-
ships are also not independent: in Figure 2 we observed both disjoint and hierarchically
nested communities, which means that of the $2^K$ possible community memberships, only
a fraction of them occur in practice.

In this section, we propose a Markov-Chain Monte Carlo (MCMC) sampler (Newman
and Barkema, 1999) which efficiently updates node-community memberships by 'collapsing'
nodes that have common features and community memberships. Note that the adaptations
to be described can be applied to any type of feature (i.e., not just binary features); all
we require is that many users share the same features. We assume binary features merely
for the sake of presentation.

We start by representing each node using binary strings that encode both its community
memberships and its features. Each node's community memberships are represented using
$S : V \to \{0,1\}^K$, such that

$$S(x)[k] = \begin{cases} 1 & \text{if } x \in C_k \\ 0 & \text{otherwise} \end{cases}. \qquad (20)$$

Similarly, each node's features are represented using the binary string Q, which, since our 
features are already binary, is simply the concatenation of the feature dimensions. 

We now say that the 'type' of a node $x$ is the concatenation of its community string
and its feature string, $(S(x); Q(x))$, and we build a (sparse) table $\mathit{types} : \{0,1\}^K \times \{0,1\}^F \to \mathbb{N}$
that counts how many nodes exist of each type.
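
A sparse table of this kind can be sketched with a counter keyed by (community string, feature string) pairs. The sketch below assumes both strings are stored as tuples of 0/1 values; the name `build_types` and the example nodes are hypothetical.

```python
from collections import Counter

def build_types(memberships, features):
    """Count how many nodes share each (community string, feature string) type."""
    return Counter((memberships[x], features[x]) for x in memberships)

# Hypothetical ego-network with K = 2 communities and F = 2 binary features.
memberships = {"a": (1, 0), "b": (1, 0), "c": (0, 1)}
features = {"a": (1, 1), "b": (1, 1), "c": (0, 1)}
types = build_types(memberships, features)
```

Only types that actually occur are stored, so the table never has more entries than there are nodes.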

In our setting, MCMC consists of repeatedly updating the (binary) label of each node
in a particular community. Specifically, if the marginal (log) probability that a node $x$
belongs to a community $k$ is given by $\ell_x^k$, then the node's new label is chosen by sampling
$z \sim U(0,1)$, and updating

$$S(x)[k] := \begin{cases} 1 & \text{if } z < \exp\left\{\frac{1}{T}\left(\ell_x^k(1) - \ell_x^k(0)\right)\right\} \\ 0 & \text{otherwise} \end{cases}, \qquad (21)$$

where $T$ is a temperature parameter that decreases at each iteration, so that we are more
likely to choose the label with higher probability as the model 'cools'.
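
The annealed label update can be sketched as follows. This is an illustrative sketch only: it assumes that the log-probability gap is divided by the temperature, so that a smaller temperature makes the choice more deterministic, and the function name `sample_label` is hypothetical.

```python
import math
import random

def sample_label(ell0, ell1, temperature, rng=random):
    """Draw z uniformly from [0, 1) and return 1 if z < exp((ell1 - ell0) / T), else 0."""
    gap = (ell1 - ell0) / temperature
    # z is always below 1, so capping the exponent at 0 leaves the outcome
    # unchanged while avoiding overflow for large positive gaps.
    threshold = math.exp(min(0.0, gap))
    z = rng.random()
    return 1 if z < threshold else 0

more_likely_one = sample_label(-1.0, 0.0, 0.5)
strongly_zero = sample_label(0.0, -1000.0, 0.01)
```

When label 1 has the higher log-probability it is always accepted; otherwise it is accepted with a probability that shrinks as the temperature decreases.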

Computing $\ell_x^k(0)$ and $\ell_x^k(1)$ (the probability that node $x$ takes the label 0 or 1 in
community $k$) requires computing $p((x,y) \in E)$ for all $y \in V$. However, we note that if
two nodes $y$ and $y'$ have the same type (i.e., they belong to the same communities and
have the same features), then $p((x,y) \in E) = p((x,y') \in E)$. In order to maximize the log-
likelihood of the observed data, we must also consider whether $(x,y)$ and $(x,y')$ are actually
edges in the graph. To do so, we first compute $\ell_x^k(0)$ and $\ell_x^k(1)$ under the assumption that
no edges are incident on $x$, after which we correct for those edges incident on $x$. Thus the
running time of a single update is linear in the number of distinct node types, plus the
average node degree, both of which are bounded by the number of nodes.
The entire procedure is demonstrated in Algorithm 2.



ALGORITHM 2: Update the membership of node $x$ to circle $k$.
Data: node $x$ whose membership to circle $C_k$ is to be updated
Result: updated membership for node $x$
initialize $\ell_x^k(0) := 0$, $\ell_x^k(1) := 0$;
construct a dummy node $x_0$ with the communities and features of $x$ but with $x \notin C_k$;
construct a dummy node $x_1$ with the communities and features of $x$ but with $x \in C_k$;
for $(c, f) \in \mathrm{dom}(\mathit{types})$ do
    // c = community string, f = feature string
    $n := \mathit{types}(c, f)$;
    // n = number of nodes of this type
    if $S(x) = c \wedge Q(x) = f$ then
        // avoid including a self-loop on x
        $n := n - 1$;
    end
    construct a dummy node $y$ with community memberships $c$ and features $f$;
    // first compute probabilities assuming all pairs (x, y) are non-edges
    $\ell_x^k(0) := \ell_x^k(0) + n \log p((x_0, y) \notin E)$;
    $\ell_x^k(1) := \ell_x^k(1) + n \log p((x_1, y) \notin E)$;
end
for $(x, y) \in E$ do
    // correct for edges incident on x
    $\ell_x^k(0) := \ell_x^k(0) - \log p((x_0, y) \notin E) + \log p((x_0, y) \in E)$;
    $\ell_x^k(1) := \ell_x^k(1) - \log p((x_1, y) \notin E) + \log p((x_1, y) \in E)$;
end
// update membership to circle k
$\mathit{types}(S(x), Q(x)) := \mathit{types}(S(x), Q(x)) - 1$;
if $z < \exp\left\{\frac{1}{T}\left(\ell_x^k(1) - \ell_x^k(0)\right)\right\}$ then
    $S(x)[k] := 1$
else
    $S(x)[k] := 0$
end
$\mathit{types}(S(x), Q(x)) := \mathit{types}(S(x), Q(x)) + 1$;



We also exploit the same observation when computing partial derivatives of the log- 
likelihood, that is we first efficiently compute derivatives under the assumption that the 
graph contains no edges, and then correct the result by summing over all edges in E. 
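
The two-pass computation (non-edges per type, then a correction per incident edge) can be sketched as below and checked against a naive sum over all nodes. This is a sketch under stated assumptions: the edge model `toy_p_edge` is a made-up stand-in for the model's edge probability, and all function and variable names are hypothetical.

```python
import math
from collections import Counter

def collapsed_scores(x, k, S, Q, types, neighbours, p_edge):
    """Return [l(0), l(1)] for node x and circle k using the collapsed type table."""
    own_type = (S[x], Q[x])
    scores = []
    for label in (0, 1):
        # dummy node x_label: same as x, but with membership k forced to `label`
        t_x = (S[x][:k] + (label,) + S[x][k + 1:], Q[x])
        total = 0.0
        # pass 1: treat every pair (x, y) as a non-edge, one term per node type
        for t_y, n in types.items():
            if t_y == own_type:
                n -= 1  # do not pair x with itself
            total += n * math.log(1.0 - p_edge(t_x, t_y))
        # pass 2: correct the terms for edges actually incident on x
        for y in neighbours[x]:
            p = p_edge(t_x, (S[y], Q[y]))
            total += math.log(p) - math.log(1.0 - p)
        scores.append(total)
    return scores

def naive_scores(x, k, S, Q, neighbours, p_edge):
    """Reference computation: one term per node y != x."""
    scores = []
    for label in (0, 1):
        t_x = (S[x][:k] + (label,) + S[x][k + 1:], Q[x])
        total = 0.0
        for y in S:
            if y != x:
                p = p_edge(t_x, (S[y], Q[y]))
                total += math.log(p if y in neighbours[x] else 1.0 - p)
        scores.append(total)
    return scores

def toy_p_edge(t_a, t_b):
    """Made-up edge model: edges are likely when two nodes share a community."""
    shared = any(a and b for a, b in zip(t_a[0], t_b[0]))
    return 0.8 if shared else 0.1

S = {"a": (1, 0), "b": (1, 0), "c": (0, 1), "d": (0, 0)}
Q = {"a": (1,), "b": (1,), "c": (0,), "d": (0,)}
neighbours = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": set()}
types = Counter((S[v], Q[v]) for v in S)
fast = collapsed_scores("a", 0, S, Q, types, neighbours, toy_p_edge)
slow = naive_scores("a", 0, S, Q, neighbours, toy_p_edge)
```

The first pass touches each distinct type once and the second pass touches each neighbour once, which is where the saving over the naive loop comes from when many nodes share a type.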
8 Experiments

We first describe the evaluation metrics to be used in Sections 8.1 and 8.2 before de-
scribing the baselines to be evaluated in Section 8.3. We describe the performance of our
(unsupervised) algorithm in Section 8.4, and extensions in Sections 8.6, 8.7, and 8.8.



8.1 Evaluation metrics 

Although our method is unsupervised, we can evaluate it on ground-truth data by ex-
amining the maximum-likelihood assignments of the latent circles $\mathcal{C} = \{C_1 \ldots C_K\}$ after
convergence. Our goal is that for a properly regularized model, the latent circles will align
closely with the human labeled ground-truth circles $\bar{\mathcal{C}} = \{\bar{C}_1 \ldots \bar{C}_{\bar{K}}\}$.

To measure the alignment between a predicted circle $C$ and a ground-truth circle $\bar{C}$, we
compute the Balanced Error Rate (BER) between the two circles (Chen and Lin, 2006),

$$\mathrm{BER}(C, \bar{C}) = \frac{1}{2}\left(\frac{|C \setminus \bar{C}|}{|C|} + \frac{|C^c \setminus \bar{C}^c|}{|C^c|}\right). \qquad (22)$$

This measure assigns equal importance to false positives and false negatives, so that trivial
or random predictions incur an error of 0.5 on average. Such a measure is preferable to the
0/1 loss (for example), which assigns extremely low error to trivial predictions. We also
report the $F_1$ score, which we find produces qualitatively similar results.
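
A minimal sketch of the Balanced Error Rate between one predicted and one ground-truth circle is given below. It assumes circles are Python sets drawn from a known set of nodes (`universe`), and it treats a term with an empty denominator as zero, which is an assumption of this sketch rather than part of the definition.

```python
def balanced_error_rate(predicted, truth, universe):
    """Average of the error rates on the predicted circle and on its complement."""
    def rate(a, b):
        # fraction of a that is not in b; defined as 0 here when a is empty
        return len(a - b) / len(a) if a else 0.0
    return 0.5 * (rate(predicted, truth) + rate(universe - predicted, universe - truth))

universe = set(range(10))
predicted = {0, 1, 2, 3}
truth = {0, 1, 2, 4}
ber = balanced_error_rate(predicted, truth, universe)
```

In this example one of the four predicted members is wrong and one of the six excluded nodes is wrong, giving (1/4 + 1/6) / 2.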



8.2 Aligning predicted and ground-truth circles 

Since we do not know the correspondence between circles in $\mathcal{C}$ and $\bar{\mathcal{C}}$, we compute the
optimal match via linear assignment by maximizing:

$$\max_{f : \mathcal{C} \to \bar{\mathcal{C}}} \frac{1}{|f|} \sum_{C \in \mathrm{dom}(f)} \big(1 - \mathrm{BER}(C, f(C))\big), \qquad (23)$$

where $f$ is a (partial) correspondence between $\mathcal{C}$ and $\bar{\mathcal{C}}$. That is, if the number of predicted
circles $|\mathcal{C}|$ is less than the number of ground-truth circles $|\bar{\mathcal{C}}|$, then every circle $C \in \mathcal{C}$
must have a match $\bar{C} \in \bar{\mathcal{C}}$, but if $|\mathcal{C}| > |\bar{\mathcal{C}}|$, we do not incur a penalty for additional
predictions that could have been circles but were not included in the ground-truth. We
use established techniques to estimate the number of circles, so that none of the baselines
suffers a disadvantage by mispredicting $K = |\mathcal{C}|$.
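
The matching objective can be illustrated with the brute-force sketch below, which enumerates every one-to-one partial correspondence rather than solving a linear assignment problem; it is only practical for a handful of circles. It assumes both lists of circles are non-empty, and the names `ber` and `best_alignment` are hypothetical.

```python
from itertools import permutations

def ber(pred, truth, universe):
    def rate(a, b):
        return len(a - b) / len(a) if a else 0.0
    return 0.5 * (rate(pred, truth) + rate(universe - pred, universe - truth))

def best_alignment(predicted, truth, universe):
    """Maximize the mean of (1 - BER) over one-to-one partial matchings."""
    size = min(len(predicted), len(truth))
    if len(predicted) <= len(truth):
        # every predicted circle is matched to a distinct ground-truth circle
        candidates = (list(zip(predicted, perm)) for perm in permutations(truth, size))
    else:
        # surplus predicted circles are left unmatched and not penalized
        candidates = (list(zip(perm, truth)) for perm in permutations(predicted, size))
    best_score, best_pairs = -1.0, []
    for pairs in candidates:
        score = sum(1.0 - ber(c, t, universe) for c, t in pairs) / size
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_score, best_pairs

universe = set(range(6))
predicted = [{0, 1}, {3, 4, 5}, {2}]
truth = [{3, 4, 5}, {0, 1}]
score, pairs = best_alignment(predicted, truth, universe)
```

In this example the third predicted circle has no ground-truth counterpart and is simply left out of the matching.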

In the case of Facebook (where we have 'complete' ground-truth, in the sense that
survey participants ostensibly label every circle), our method ought to penalize predicted
circles that do not appear in the ground-truth. A simple penalty would be to assign an
error of 0.5 (i.e., that of a random prediction) to additional circles in the case of Facebook.
However, in our experience, our method did not overpredict the number of circles in the case
of Facebook: on average, users identified 19 circles, whereas using the Bayesian Information
Criterion described in Section 3.1, our method never predicted $K > 10$. In practice this
means that in the case of Facebook, we always penalize all predictions. Again we note that
the process of choosing the number of circles using the BIC is a standard procedure from
the literature (Airoldi et al., 2008; Handcock et al., 2007a; Volinsky and Raftery, 2000),
whose merit we do not assess in this paper.

Network Modularity. Although for our algorithm, and other probabilistic baselines,
we shall choose the number of communities using the Bayesian Information Criterion as
described in Section 3.1, another standard criterion used to determine the number of com-
munities in a network is the modularity (Newman, 2006).



The Bayesian Information Criterion has the advantage that it allows for overlapping 
communities, whereas the modularity does not (i.e., it assumes all communities are dis- 
joint); it is for this reason that we chose the BIC to choose K for our algorithm. On 
the other hand, the Bayesian Information Criterion can only be computed for probabilistic 
models (i.e., models that associate a likelihood with each prediction), whereas the modu- 
larity has no such restriction. For this reason, we shall use the modularity to choose K for 
non-probabilistic baselines. 

The modularity essentially measures the extent to which clusters in a network have
dense internal, but sparse external, connections (Newman, 2003). If $e_{ij}$ is the fraction of
edges in the network that connect vertices in $C_i$ to vertices in $C_j$, then the modularity is
defined as

$$Q(K) = \sum_{k=1}^{K} \left( e_{kk} - \Big( \sum_{i=1}^{K} e_{ik} \Big)^2 \right). \qquad (24)$$

We then choose $K$ so that the modularity is maximized.
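
The modularity of a disjoint partition can be computed as in the sketch below. It assumes an undirected graph given as an edge list and uses the usual symmetric convention in which each between-community edge contributes half to $e_{ij}$ and half to $e_{ji}$, so that the row sum for a community equals its total degree divided by twice the number of edges. The function name and the example graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity of a disjoint partition of an undirected graph."""
    m = len(edges)
    internal = defaultdict(int)  # number of edges with both endpoints in a community
    degree = defaultdict(int)    # total degree of the nodes in a community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_communities = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_split = modularity(edges, two_communities)
q_single = modularity(edges, {v: "A" for v in range(6)})
```

Splitting the graph into its two triangles scores higher than placing every node in a single community, which always has modularity zero.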



8.3 Baselines 

We considered a wide number of baseline methods, including those that consider only 
network structure, those that consider only profile information, and those that consider 
both. 

Mixed Membership Stochastic Block Models. (Airoldi et al., 2008). This method
detects communities based only on graph structure; the output is a stochastic vector for
each node encoding partial memberships to each community. The optimal number of
communities $K$ is determined using the Bayesian Information Criterion as described in
(eq. 11). This model is similar to those of (Liu et al., 2009) and (Chang and Blei, 2009), the
latter of which includes the implementation of MMSB that we used. Since we require 'hard'
memberships for evaluation, we assign a node to a community if its partial membership to
that community is positive.

Block-LDA. (Balasubramanyan and Cohen, 2011). This method is similar to MMSB, except
that it allows nodes to be augmented with side information in the form of 'documents'.
For our purposes, we generate 'documents' by treating aspects of user profiles as words in
a bag-of-words model.

K-means clustering. (MacKay, 2003). Just as MMSB uses only the graph structure, K-
means clustering ignores the graph structure and uses only node features (for node features
we again use a bag-of-words model). Here, we choose $K$ so as to maximize the modularity
of $\mathcal{C}$, as defined in (eq. 24).

Hierarchical Clustering. (Johnson, 1967). This method builds a hierarchy of clusters.
Like K-means, this method forms clusters based only on node profiles, but ignores the
network.



Link Clustering. (Ahn et al., 2010). Conversely, this method uses network structure,
but ignores node features, to construct hierarchical communities in networks.

Table 1: Baselines

Algorithm                     network      node/edge    overlapping     hard
                              structure?   features?    communities?    memberships?
MMSB                          Yes          No           Yes             No
Block-LDA                     Yes          Yes          Yes             No
K-means                       No           Yes          No              Yes
Hierarchical Clustering       No           Yes          No              Yes
Link Clustering               Yes          No           No              Yes
Clique Percolation            Yes          No           Yes             Yes
Low-Rank Embedding            Yes          Yes          No              Yes
Multi-Assignment Clustering   No           Yes          Yes             Yes
Our algorithm                 Yes          Yes          Yes             Yes


Clique Percolation. (Palla et al., 2005). This method also uses only network structure,
and builds communities from the union of small, densely-connected sub-communities.

Low-Rank Embedding. (Yoshida, 2010). Uses both graph structure and node similarity
information, but does not perform any learning. We adapt an algorithm described by
(Yoshida, 2010), where node similarities are based on the cosine distance between profile
bags-of-words. After our features are embedded into a low-dimensional space, we again
use K-means clustering to detect communities, again choosing $K$ so as to maximize the
modularity.

Multi-Assignment Clustering. (Streich et al., 2009). Like ours, this method predicts
hard assignments to multiple clusters, though it does so without using the network struc-
ture.

The above methods (and our own) are summarized in Table 1. Of the eight baselines
highlighted above we report the three whose overall performance was the best, namely
Block-LDA (Balasubramanyan and Cohen, 2011) (which slightly outperformed mixed mem-
bership stochastic block models (Airoldi et al., 2008)), Low-Rank Embedding (Yoshida,
2010), and Multi-Assignment Clustering (Streich et al., 2009).




8.4 Performance on Facebook, Google+, and Twitter Data 

Figure 4 shows results on our Facebook, Google+, and Twitter data. The largest circles
from Google+ were excluded as they exceeded the memory limits of many of the
baseline algorithms. Circles were aligned as described in (eq. 23), with the number of circles
$K$ determined as described in Section 3.1. For non-probabilistic baselines, we chose $K$ so
as to maximize the modularity, as described in (eq. 24). In terms of absolute performance
our best model $\phi^1$ achieves BER scores of 0.84 on Facebook, 0.72 on Google+ and 0.70 on
Twitter ($F_1$ scores are 0.59, 0.38, and 0.34, respectively). The lower $F_1$ scores on Google+
and Twitter are explained by the fact that many circles have not been maintained since
they were initially created: we achieve high recall (we recover the friends in each circle), but
at low precision (we recover additional friends who appeared after the circle was created).

Comparing our method to baselines we notice that we outperform all baselines on all
datasets by a statistically significant margin. Compared to the nearest competitors, our
best performing features $\phi^1$ improve on the BER by 43% on Facebook, 26% on Google+,
and 16% on Twitter (improvements in terms of the $F_1$ score are similar). Regarding the
performance of the baseline methods, we note that good performance seems to depend
critically on predicting hard memberships to multiple circles, using a combination of node
and edge information; none of the baselines from Table 1 exhibit precisely this combination,
a shortcoming our model addresses.

Both of the features we propose (friend-to-friend features $\phi^1$ and friend-to-user features
$\phi^2$) perform similarly, revealing that both schemes ultimately encode similar information,
which is not surprising, since users and their friends have similar profiles. Using the 'com-
pressed' features $\psi^1$ and $\psi^2$ does not significantly impact performance, which is promising
since they have far lower dimension than the full features; what this reveals is that it is
sufficient to model categories of attributes that users have in common (e.g. same school,
same town), rather than the attribute values themselves.

We found that all algorithms perform significantly better on Facebook than on Google+
or Twitter. There are a few explanations: Firstly, our Facebook data is complete, in the
sense that survey participants manually labeled every circle in their ego-networks, whereas
in other datasets we only observe publicly-visible circles, which may not be up-to-date.
Secondly, the 26 profile categories available from Facebook are more informative than the
6 categories from Google+, or the tweet-based profiles we built from Twitter. A more basic
difference lies in the nature of the networks themselves: edges in Facebook encode mutual
ties, whereas edges in Google+ and Twitter encode follower relationships, which changes
the role that circles serve (Wu et al., 2011). The latter two points explain why algorithms
that use either edge or profile information in isolation are unlikely to perform well on this
data.



8.5 Qualitative Analysis 

Next we examine the output of our model in greater detail. Figure 5 shows results of our
unsupervised method on example ego-networks from Facebook and Google+. Different
colors indicate true and false positives and negatives. Our method is correctly able to identify
overlapping circles as well as sub-circles (circles within circles).

Figure [6] shows parameter vectors learned for four circles for a particular Facebook 
user. Positive weights indicate properties that users in a particular circle have in common. 
Notice how the model naturally learns the social dimensions that lead to a social circle. 
Moreover, the first parameter that corresponds to a constant feature '1' has the highest 
weight; this reveals that membership to the same community provides the strongest signal 
that edges will form, while profile data provides a weaker (but still relevant) signal. 



8.6 Circle Maintenance 

Next we examine the problem of adding new users to already-defined ego-networks, in
which complete circles have already been provided. For evaluation, we suppress a single
user $u$ from a user's ego-network, and learn the model parameters $\Theta$ that best fit $G \setminus \{u\}$
and $\mathcal{C} \setminus \{u\}$. Our goal is then to recover the set of communities to which the node $u$ belongs,
as described in Section 4.1. Again we report the Balanced Error Rate and $F_1$ score between
the ground-truth and the predicted set of community memberships for $u$. We use all of
each user's circles for training, up to a maximum of fifteen circles. This experiment is
repeated for 10 random choices of the user $u$ for each ego-network in our dataset.

As a baseline we compare the performance of our algorithm to that of a fully-supervised
Support Vector Machine (SVM) model. For each community $C_k$, we train a binary classi-
fier that discriminates members from non-members based on their node features. Binary
classifications are then made for each community independently.

Performance on this task is shown in Figure 7. On Facebook, Google+, and Twitter
our best performing features $\phi^1$ achieve Balanced Error Rates of 0.30, 0.34, and 0.34
(respectively), and $F_1$ scores of 0.38, 0.59, and 0.54. The SVM model achieves better
accuracy when rich node features are available (which is the case for Facebook), though it
fails to make use of edge information, and does not account for interdependencies between
circles. This proves critical in the case of Google+ and Twitter, where node information
alone proves uninformative.



8.7 Semi-Supervised Circle Prediction 

Our next task is to identify circles using a form of weak supervision provided by the user,
in the form of seed nodes as described in Section 4.2. In this setting, the user provides $S$
seed nodes for each of $K$ circles that they wish to identify. For evaluation, we select the $K$
circles to be identified and the $S$ seed nodes uniformly at random.

Without seed nodes (as in our initial experiments), the circles that are automatically
identified by our algorithm may be quite different from those identified once seed nodes are
added. Similarly, there may be many circles containing the same seed nodes, meaning that
different solutions may be chosen for different values of $S$. Thus it is difficult to compare
the loss of (eq. 23) with and without seed nodes. To address this, we modify the matching
objective of (eq. 23) so that the $K$ circles randomly selected for seeding must be the same
as those matched when evaluating the loss. Thus the loss is always evaluated on the same
$K$ circles for every number of seed nodes $S \in \{0 \ldots 10\}$. Note also that for each value of $K$,
performance is only evaluated on those ego-networks with at least $K$ ground-truth circles.

Figure 8 shows the performance of our algorithm for different numbers of seed nodes
$S \in \{0 \ldots 10\}$ and different numbers of circles $K \in \{1 \ldots 5\}$. The same results in terms
of the $F_1$ score are qualitatively similar and are omitted for brevity. We find that for all
values of $K$, adding seed nodes increases the accuracy significantly, though the effect is
most pronounced when the number of circles that the user wishes to identify is small.

Curiously, we find that while larger values of $K$ lead to better prediction when there
are no seeds, the opposite is true when there are many seeds. The former behavior may be
explained by the simple fact that larger values of $K$ are better able to fit the data, though
the latter behavior is more enigmatic. Pleasingly, assuming that a user wishes to identify
only a small number of circles at a time, then they can do so with very few seeds: for small
$K$, most of the benefit is gained once only two or three seeds are provided.



8.8 Scalability Analysis 

Figure 9 examines how our algorithm scales with the size of an ego-network. Here we use
the Markov-Chain Monte-Carlo (MCMC) version of our algorithm described in Section 7.
Figure 9 shows the total time taken to predict different numbers of circles in differently
sized ego-networks. Since the performance of our MCMC algorithm is a function of the
number of circles $K$ and the feature dimensionality $F$, we fix the feature dimensionality
at $F = 10$ for all ego-networks, using the ten most common features that appear in each
ego-network using the 'friend-to-friend' features $\phi^1$.

For comparison, Figure 9 shows the running time of inference using QPBO as described
in Section 3. Although the two algorithms are competitive for up to a few hundred nodes,
the QPBO algorithm becomes intractable for networks of around 1000 nodes, since it 
requires us to optimize a probability distribution defined on complete graphs (in practice, 
in order to apply the QPBO algorithm in the previous experiments, we did not construct 
complete graphs, but rather included only those edges whose influence on the likelihood 
was maximal). 

Although this version of the algorithm is not particularly efficient for small networks 
(identifying K = circles on an ego network with 1000 nodes requires around one hour), 
it has the advantage that it is easily able to scale to the largest ego-networks that are ever 
encountered. For very large networks, the algorithm is able to take advantage of the fact 
that many nodes with the same features and community memberships can be 'collapsed', so
that the running time increases only modestly between 2500 and 5000 node ego-networks.

Figure 10 shows the accuracy of our MCMC algorithm in terms of the Balanced Error
Rate and $F_1$ score. We note that the best performance of our algorithm is obtained on
reasonably small ego-networks, though in practice small networks account for the vast
majority of our data. Note that the results for any particular value of $K$ are slightly worse
than those reported in Figure 4, since we are not selecting $K$ using the BIC described in
Section 3.1. Although performance clearly degrades for large ego-networks, it remains an
open question whether this is due to the difficulty of optimization on large networks, or
simply due to the fact that our model assumptions become increasingly violated as large
networks become less 'community-like'.



9 Discussion and Future Work 

We have modeled circle detection as a problem that can be solved independently for each 
user. In practice this assumption is advantageous, as it allows us to deal with several 
small problems independently, using sophisticated models that could not easily scale to 
networks with millions of nodes. However, it is possible that circles could be more ac- 
curately predicted by exploiting relationships between the circles of multiple users. For 
example, if a user has a 'Stanford' circle in their ego-network, it is highly likely that users 
belonging to that circle will also have Stanford circles within their own ego-networks. Al- 
ternately, if a Stanford community could be detected across the entire Facebook, Google+, 
or Twitter network, then a user's 'Stanford' circle might simply be the intersection of their 
ego-network with that community. Although studying such models is an appealing avenue 
for future work, it is unfortunately not possible using our data, where we do not have 
access to complete network information. 

Although we developed algorithms that scale to the largest ego-networks that we en-
countered, we find that the best performance occurs on ego-networks with up to a few
hundred nodes, but degrades significantly for networks with more than 1000. It remains to
be seen whether this is a shortcoming of our algorithm (due to the fact that optimization
is more difficult for large networks), or whether the assumptions of our model simply break
down at large scales. Our fundamental assumption that circles will be made up of close-
knit groups of friends with common properties seems like a better fit to networks with at
most a few hundred nodes.

We also found that performance on even the largest Facebook networks (i.e., over 1000 
friends) was better than that obtained on small networks from Google+ and Twitter. This 
suggests that it is not merely the size of the networks that causes our model assumptions to 
become violated, but rather the very nature of the networks themselves (in addition to the 
differences in the ground-truth already mentioned). Naturally, a circle containing members 
of the same squash team (as we find on Facebook) is fundamentally different from a circle 
containing presidential candidates (as we find on Google+). It remains to design a circle 
detection algorithm that is tailored for networks with asymmetric following relationships. 

10 Conclusion 

'Circles' allow us to organize the overwhelming volumes of data generated by our personal 
social networks, though they are laborious to construct manually. We have designed an 
algorithm to automatically detect circles in ego-networks, which we evaluated on a dataset 
of 1,143 ego-networks and 5,541 ground-truth circles obtained from Facebook, Google+,
and Twitter. We find in such data circles that are disjoint, overlapping, and hierarchically
nested, and design our model with such behavior in mind. Our model is unsupervised, but
can also make use of weakly-labeled data that may be available in practice. Experiments
reveal that social circles can be accurately detected using a combination of both network
and profile information.

Figure 3: Feature construction. Profiles are tree-structured, and we construct features by
comparing paths in those trees. Examples of trees for two users $x$ (blue) and $y$ (pink) are
shown at top. Two schemes for constructing feature vectors from these profiles are shown at
bottom: (1) (bottom left) we construct binary indicators measuring the difference between
leaves in the two trees, e.g. 'work→position→Cryptanalyst' appears in both trees. (2)
(bottom right) we sum over the leaf nodes in the first scheme, maintaining the fact that
the two users worked at the same institution, but discarding the identity of that institution.

Figure 4: Performance on Facebook, Google+, and Twitter, in terms of the Balanced Error
Rate (top), and the $F_1$ score (bottom). Higher is better. Error bars show standard error.
The improvements of our best features $\phi^1$ compared to the nearest competitor are significant
at the 1% level or better.

Figure 5: Top: Three detected circles on a small ego-network from Facebook, compared to
three ground-truth circles (BER ≈ 0.81). Blue nodes: true positives. Grey: true negatives.
Red: false positives. Yellow: false negatives. Our method correctly identifies the largest
circle (left), a sub-circle contained within it (center), and a third circle that significantly
overlaps with it (right). Bottom: Four detected circles on ego-networks from Google+
(BER ≈ 0.73). Green nodes in the two right networks show additional detected circles,
whose accuracy cannot be evaluated as we only observed two circles in the ground-truth.

Figure 6: Parameter vectors of four communities for a particular Facebook user. The
top four plots show 'complete' features $\phi^1$, while the bottom four plots show 'compressed'
features $\psi^1$ (in both cases, BER ≈ 0.78). For example the former features encode the fact
that members of a particular community tend to speak German, while the latter features
encode the fact that they speak the same language. (Personally identifiable annotations
have been suppressed.)

Figure 7: Accuracy of assigning a new node to already-existing circles. Although a fully-
supervised Support Vector Machine gives accurate results on Facebook (where node fea-
tures are highly informative), our model yields far better results on Google+ and Twitter
data.

Figure 8: Number of seeds versus accuracy (1 - Balanced Error Rate) for different numbers
of circles $K$. For each of the $K$ circles being identified, the user provides the same number
of seeds. Although providing additional seeds is generally beneficial to performance for all
$K$, the benefit is most pronounced when the number of circles to be identified is small.
Results in terms of the $F_1$ score are qualitatively similar and are omitted for brevity.

Figure 9: Running time of our Markov Chain Monte Carlo (MCMC) algorithm for different
ego-network sizes and different values of $K$ (the number of circles to be detected). For
comparison, our previously described inference algorithm (based on QPBO (Rother et al.,
2007)) is shown for $K = 10$.

Figure 10: Accuracy of our Markov Chain Monte Carlo (MCMC) algorithm, in terms of
the Balanced Error Rate (top), and the $F_1$ score (bottom).